7/22/2018

Introduction

This web page presentation is part of the Developing Data Products course on Coursera.org. It was created using R Markdown and features a plot created with Plotly.

In this presentation, we use the diamonds dataset from the ggplot2 package to plot the relationship between the price and the weight (in carats) of the diamonds, grouped by the quality of the cut.

Let's have a quick look at our dataset, keeping only the three columns we need:

library(ggplot2)  # provides the diamonds dataset
diamonds <- diamonds[, c("carat", "cut", "price")]  # keep weight, cut quality, and price
summary(diamonds)
##      carat               cut            price      
##  Min.   :0.2000   Fair     : 1610   Min.   :  326  
##  1st Qu.:0.4000   Good     : 4906   1st Qu.:  950  
##  Median :0.7000   Very Good:12082   Median : 2401  
##  Mean   :0.7979   Premium  :13791   Mean   : 3933  
##  3rd Qu.:1.0400   Ideal    :21551   3rd Qu.: 5324  
##  Max.   :5.0100                     Max.   :18823

Because the dataset is fairly large (53,940 rows), we take a random sample of 3,000 records to represent the entire dataset.

library(dplyr)  # provides sample_n()
diamonds <- sample_n(diamonds, 3000)  # draw 3,000 rows at random, without replacement
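
Because the sample is random, each run of the code above returns a different set of 3,000 rows. A minimal sketch of how the sampling step could be made reproducible and sanity-checked is shown here. The seed value and the names diamonds_small and diamonds_sample are assumptions for illustration and do not come from the original presentation.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides sample_n()

# Assumed seed, not from the original; any fixed integer gives a repeatable sample
set.seed(2018)

# Hypothetical names, used so the full dataset is not overwritten
diamonds_small  <- ggplot2::diamonds[, c("carat", "cut", "price")]
diamonds_sample <- sample_n(diamonds_small, 3000)

# Compare the share of each cut in the full data and in the sample;
# similar proportions suggest the sample is reasonably representative
round(prop.table(table(diamonds_small$cut)), 3)
round(prop.table(table(diamonds_sample$cut)), 3)
```

If the two tables of proportions are close, the 3,000-row sample preserves the mix of cut qualities well enough for plotting.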

A diamond's weight and cut have a positive relationship with its price
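
The plotting code itself is not shown on this slide. The following is a rough sketch of how such a Plotly scatter plot might be built, assuming the sampled data has the carat, cut, and price columns selected earlier. The seed, the name plot_data, and the title and axis labels are assumptions, not the original author's choices.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides sample_n()
library(plotly)   # provides plot_ly() and layout()

# Assumed seed and hypothetical name, so the sketch runs on its own
set.seed(2018)
plot_data <- sample_n(ggplot2::diamonds[, c("carat", "cut", "price")], 3000)

# Scatter plot of price against weight, with one colour per cut quality
p <- plot_ly(plot_data, x = ~carat, y = ~price, color = ~cut,
             type = "scatter", mode = "markers") %>%
  layout(title = "Diamond price by weight and cut",
         xaxis = list(title = "Weight (carat)"),
         yaxis = list(title = "Price (US dollars)"))

p  # printing the object renders the interactive plot
```

In a plot like this, hovering over a point shows its carat, price, and cut, which makes it easier to compare cut qualities at a given weight.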